Learning Paraphrasing for Multiword Expressions

Authors

  • Seid Muhie Yimam
  • Héctor Martínez Alonso
  • Martin Riedl
  • Christian Biemann
Abstract

In this paper, we investigate the impact of context on the paraphrase ranking task, comparing and quantifying results for multi-word expressions and single words. We focus on the systematic integration of existing paraphrase resources to produce paraphrase candidates, and later ask human annotators to judge paraphrasability in context. We first conduct a paraphrase-scoring annotation task with and without context for targets that are i) single- and multi-word expressions and ii) verbs and nouns. We quantify how differently annotators score paraphrases when context information is provided. Furthermore, we report on experiments with automatic paraphrase ranking. If we regard the problem as a binary classification task, we obtain F1-scores of 81.56% and 79.87% for multi-word expressions and single words, respectively, using a kNN classifier. Approaching the problem as a learning-to-rank task, we attain MAP scores of up to 87.14% and 91.58% for multi-word expressions and single words, respectively, using LambdaMART, thus yielding high-quality contextualized paraphrase selection. Further, we provide the first dataset with paraphrase judgments for multi-word targets in context.
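For readers unfamiliar with the MAP (mean average precision) figure quoted in the abstract, the following is a minimal sketch of how the metric is commonly computed over ranked candidate lists. It is not the authors' evaluation code. The function names and the toy relevance labels are hypothetical, and it assumes each paraphrase candidate carries a binary "acceptable in context" label.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked candidate list.

    relevance[i] is 1 if the candidate at rank i + 1 is an acceptable
    paraphrase and 0 otherwise (binary labels are an assumption here).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # A list with no relevant candidate contributes 0 by convention.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-target average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: one ranked candidate list per target expression.
toy_rankings = [
    [1, 0, 1, 0],  # good paraphrases at ranks 1 and 3
    [0, 1, 1],     # good paraphrases at ranks 2 and 3
]
map_score = mean_average_precision(toy_rankings)
print(f"MAP = {map_score:.4f}")
```

A ranker that places every acceptable paraphrase ahead of every unacceptable one scores 1.0 on this metric, so the reported values can be read as how close the learned ranking comes to that ideal ordering.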


Similar resources

CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

Currently available alignment tools and procedures for marking-up alignments overlook non-contiguous multiword units for being too complex within the bounds of the proposed alignment methodologies. This paper presents the CLUE-Aligner (Cross-Language Unit Elicitation Aligner), a web alignment tool designed for manual annotation of pairs of paraphrastic and translation units, representing both c...


Paraphrasing of Synonyms for a Fine-grained Data Representation

The paper addresses the question of how the paraphrasing of synonyms can be linked with a fine-grained ontology-based data representation. Our challenge is to identify for a set of synonyms (including terms and multiword expressions) the best lexical paraphrases suitable for given contexts. Our hypothesis is that: i. the minimal context in which the paraphrasing can be validated is different for di...


Automatic Extraction of Fixed Multiword Expressions

Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their s...


Never-Ending Multiword Expressions Learning

This paper introduces NEMWEL, a system that performs Never-Ending MultiWord Expressions Learning. Instead of using a static corpus and classifier, NEMWEL applies supervised learning on automatically crawled news texts. Moreover, it uses its own results to periodically retrain the classifier, bootstrapping on its own results. In addition to a detailed description of the system’s architecture and...


Extracting Transfer Rules for Multiword Expressions from Parallel Corpora

This paper presents a procedure for extracting transfer rules for multiword expressions from parallel corpora for use in a rule-based Japanese-English MT system. We show that adding the multi-word rules improves translation quality and sketch ideas for learning more such rules.




Publication date: 2016